(57) ABSTRACT

The present invention relates to an Inductive Monitoring 
System (IMS), its software implementations, hardware 
embodiments and applications. Training data is received, 
typically nominal system data acquired from sensors in 
normally operating systems or from detailed system simu- 
lations. The training data is formed into vectors that are used 
to generate a knowledge database having clusters of nominal 
operating regions therein. IMS monitors a system’s perfor- 
mance or health by comparing cluster parameters in the 
knowledge database with incoming sensor data from a 
monitored-system formed into vectors. Nominal perfor- 
mance is concluded when a monitored-system vector is 
determined to lie within a nominal operating region cluster 
or to lie sufficiently close to such a cluster as determined by
a threshold value and a distance metric. Some embodiments 
of IMS include cluster indexing and retrieval methods that 
increase the execution speed of IMS. 

14 Claims, 10 Drawing Sheets 
[Drawing sheets: Fig. 7, Fig. 10, Fig. 11A]

[Fig. 13, partial flowchart text: "...any range specified for that parameter in any cluster." (218) "Compare the adjusted monitored-system vector against the cluster database."]

[Fig. 14, flowchart text: "Generate a cluster database using anomalous data for a given failure." "Annotate that cluster database with the failure that was manifested to provide a diagnostic cluster database." "When off-nominal data is detected, compare the monitored-system vector to the diagnostic cluster database."]
INDUCTIVE MONITORING SYSTEM
CONSTRUCTED FROM NOMINAL SYSTEM
DATA AND ITS USE IN REAL-TIME SYSTEM
MONITORING

ORIGIN OF INVENTION 

The invention described herein was made by an employee 
of the United States Government and may be manufactured 
and used by or for the Government for governmental purposes
without payment of any royalties thereon or therefor.

BACKGROUND OF INVENTION 

1. Technical Field of the Invention 

This invention relates generally to the field of automated 
system monitoring and anomaly detection and, in particular, 
to methods of generating system monitoring knowledge 
bases from nominal system behavior, and to the use of these 
knowledge bases in monitoring system performance in real-time
or near-real-time.

2. Description of the Prior Art 

The modern information age provides great quantities of 
raw data concerning the performance of man-made engi- 
neered systems as well as data concerning the behavior of 
natural systems. Numerous information processing techniques
have been employed to attempt to classify such data,
look for anomalies, or otherwise assist humans to extract, 
understand and/or respond to information contained in the 
data. Examples of such techniques include model based 
reasoning, machine learning, neural networks, data mining,
support vector machines, various decision tree models 
including ID3 decision tree learner, among many others. 
However, these techniques typically have one or more 
drawbacks that render them unsuitable or disfavored for 
some applications.

For example, model based reasoning and related tech- 
niques typically require a detailed engineering simulation of 
the system under study, often including expert knowledge of 
system behavior, detailed behavior of system components 
and subsystems, detailed knowledge of interaction among 
system components and failure mechanisms, among other
knowledge. Such knowledge may not be available for all 
components and subsystems. Furthermore, even when a 
reasonably accurate system simulation is available, it often 
requires impractical amounts of computer resources. That is, 
the simulation may execute too slowly to provide information
in real-time or near-real-time so as to be unsuitable for
many practical system monitoring applications. In addition, 
the computer resources may not be available in space- 
limited or weight-limited environments such as space 
vehicles. Thus, a need exists in the art for computationally
rapid techniques to monitor the performance of a system and 
detect anomalous behavior without the need for excessive 
computer resources. 

Some classification or decision models require that the 
system be trained with data that includes data derived from
both normally-functioning systems (nominal data) as well as
data derived from anomalous system behavior (off-nominal 
data). In many practical applications, off-nominal data is 
unavailable for training, and even the nominal data available 
for training may not fully explore all of the system’s nominal 
operating regimes. Thus, a further need exists in the art for
techniques to monitor a system’s performance that do not
require off-nominal data for training. 

SUMMARY OF THE INVENTION 

Accordingly and advantageously, the present invention 
relates to an Inductive Monitoring System (IMS), its software
implementations and applications. IMS builds one or
more system monitoring knowledge bases that are subse- 
quently used to monitor system performance. IMS analyzes 
nominal system data and forms classes or clusters of 
expected system sensor values. These clusters are used in 
building a system monitoring knowledge base. IMS system 
monitoring knowledge bases can be constructed entirely 
from several sets of nominal sensor data, either gathered 
directly from the sensors of the system to be monitored 
during actual operation, or gathered from system simula- 
tions, or both. Thus, among the advantages of IMS is that 
off-nominal data is not needed to train IMS. Further advan- 
tages of IMS include avoiding the complexity and expense 
of constructing a detailed system model that may require 
many hours of expert time to create and/or significant 
computer resources to run. 

IMS monitors a system by comparing incoming sensor 
data with the clusters in the knowledge base to determine a 
“distance” (defined according to a suitable metric) from the 
incoming sensor data to the nearest knowledge base cluster. 
If the incoming sensor data lies sufficiently close to a cluster 
of the knowledge base derived from nominal system per- 
formance, the system is deemed to be functioning normally 
since it is sufficiently close to previous normal behavior. 
“Sufficiently close” is determined by threshold parameters 
supplied by the user to take into account that training data 
will not generally span the entire space of normal system 
operation, and the sensor data from nominally operating 
systems is inherently subject to a level of “noise,” causing 
deviations in data received from sensors even for identi- 
cally-operating systems. 

An important function of IMS is to monitor system 
performance (or the “health” of the system) in order to detect 
off-nominal performance, and not necessarily to provide 
thorough system diagnostics. However, limited diagnostic 
information may be available with IMS and can provide 
useful diagnostic information in some circumstances. For 
example, the amount by which off-nominal data fails to meet 
the threshold criteria (that is, the “distance” from the nearest 
cluster) can provide information as to the severity of the 
anomalous performance. In addition, the particular cluster 
closest to the suspicious system data can also provide useful 
guidance as to the possible nature of the anomalous perfor- 
mance (and/or exclude certain types of system malfunc- 
tions). 

Some embodiments of IMS include cluster indexing and 
retrieval methods that are shown to be advantageous for 
increasing the execution speed of IMS. Distances are deter- 
mined from the clusters of the knowledge database to one or 
more reference points. The clusters are sorted into a list 
according to these distance values, typically in ascending 
order of distance. When a set of input data arrives and is to 
be tested, such input data is arranged as an ordered set of 
numbers, or an input vector. The distance from the input 
vector to the reference point(s) is then computed. The search 
of clusters from the list can be limited to those clusters lying 
within a certain distance range from the input vector, thereby 
increasing the system monitoring speed. 
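
The indexing idea summarized above can be illustrated with a minimal Python sketch. It assumes each cluster is represented by a single point (for example, its centroid) and that a single reference point is used; the names `build_index`, `candidate_clusters` and `window` are hypothetical and do not appear in the disclosure.

```python
import bisect
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def build_index(cluster_points, reference):
    """Sort clusters in ascending order of distance from the reference point.

    cluster_points holds one representative point per cluster (an assumption).
    Returns a list of (distance_to_reference, cluster_id) pairs.
    """
    return sorted((euclidean(p, reference), i) for i, p in enumerate(cluster_points))


def candidate_clusters(index, vector, reference, window):
    """Return ids of clusters whose reference distance lies within
    `window` of the input vector's reference distance."""
    d = euclidean(vector, reference)
    keys = [dist for dist, _ in index]
    lo = bisect.bisect_left(keys, d - window)
    hi = bisect.bisect_right(keys, d + window)
    return [cluster_id for _, cluster_id in index[lo:hi]]


reference = [0.0, 0.0]
index = build_index([[1.0, 0.0], [5.0, 0.0], [10.0, 0.0]], reference)
near = candidate_clusters(index, [4.5, 0.0], reference, window=1.0)
```

Under the single-point assumption, the triangle inequality guarantees that a cluster point whose reference distance differs from that of the input vector by more than `window` cannot itself lie within `window` of the input vector, so only the returned candidates need a full distance computation.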

These and other advantages are achieved in accordance 
with various embodiments of the present invention as 
described in detail below. 

BRIEF DESCRIPTION OF THE DRAWINGS 

To facilitate understanding, identical reference numerals 
have been used, where possible, to designate identical 
elements that are common to the figures. 
The techniques of the present invention can readily be
understood by considering the following detailed descrip- 
tion in conjunction with the accompanying drawings, in 
which: 

FIG. 1 depicts a high-level block diagram of a typical 
embodiment of the inductive monitoring system. 

FIG. 2 depicts a diagram of a typical data vector used by 
the inductive monitoring system of FIG. 1.

FIG. 3 depicts an exemplary data vector as used by the 
inductive monitoring system of FIG. 1. 

FIG. 4 depicts a diagram of a typical cluster of the cluster 
database of FIG. 1.

FIG. 5 depicts an exemplary cluster having a pair of data 
vectors showing maximum and minimum values for the 
parameters. 

FIG. 6 depicts an example of a two-dimensional mini- 
mum bounding rectangle for an exemplary cluster. 

FIG. 7 depicts a flowchart of an embodiment of a tech- 
nique to generate a cluster database by the inductive learning 
module of FIG. 1. 

FIG. 8 depicts a flowchart of an embodiment of another 
technique to generate a cluster database which scales or 
normalizes at least a subset of the input parameter values of 
the input vector of the inductive learning module of FIG. 1. 

FIG. 9 depicts a flowchart of an embodiment of a tech- 
nique used in the system monitoring module of FIG. 1. 

FIG. 10 depicts a flowchart of a typical cluster indexing 
technique to organize clusters for faster system monitoring. 

FIGS. 11A and 11B collectively depict a flowchart of an
embodiment of a technique to access a cluster in a cluster 
database organized in accordance with the indexing tech- 
nique of FIG. 10. 

FIG. 12 depicts a flowchart of an embodiment of a 
technique using the system monitoring module to generate 
training data for another learning application. 

FIG. 13 depicts a flowchart of an embodiment of a 
technique to adjust erroneous parameters in the system 
monitoring module of FIG. 1. 

FIG. 14 depicts a high-level flowchart of an embodiment 
of a technique to generate a diagnostic cluster database in the 
inductive learning module and, in the system monitoring 
database, monitoring data using the diagnostic cluster data- 
base. 

FIG. 15 depicts a high-level block diagram of an exem- 
plary computer system that can be used for implementation 
of the inductive monitoring system. 

FIG. 16 depicts a schematic block diagram of the sub- 
system of the Hybrid Combustion Facility for oxygen vapor- 
ization and oxygen supply to the combustion chamber. 

FIG. 17 depicts a typical vector used by the inductive 
monitoring system to learn and then to monitor the perfor- 
mance of the oxygen supply subsystem of the Hybrid 
Combustion Facility. 

DETAILED DESCRIPTION OF THE 
INVENTION 

After considering the following description, those skilled 
in the art will clearly realize that the teachings of the 
invention can be readily utilized for monitoring the perfor- 
mance of systems, or system health, in real-time or near- 
real-time. 

The Inductive Monitoring System (IMS), in its various 
embodiments and implementations, is used to build or 
generate one or more “system monitoring knowledge bases 
or databases” also referred to as “knowledge bases or 
databases” or “cluster database(s)” without distinction. The 
system monitoring knowledge bases can be generated from
either actual sensor data gathered during system runs and 
typically stored as archival data, or from “pseudo-sensor
data” generated by system simulations (if available), or a 
combination of both actual and simulated sensor data. IMS
typically generates the monitoring knowledge bases from 
“nominal” system runs or simulations, that is from runs or 
simulations that exhibit normal performance without 
anomalous behavior. This is in contrast to many fault-detection
or system health monitoring procedures in which
anomalous or “off-nominal” system behavior is required. 

Another important objective of IMS is to provide tech- 
niques to automatically produce system monitoring knowl- 
edge bases for systems that are either difficult to model 
(simulate) or which require computer models that are too
complex for use in real-time or near-real-time monitoring. 
IMS uses nominal data sets collected either directly from the 
system or from detailed simulations of the system to build a 
knowledge base that can then be used to detect anomalous 
behavior in the system. IMS “learns” typical system behavior
by extracting general classes of nominal data from
archived data sets and is thereby able to monitor the system 
by comparing real-time operational data with these classes 
in the knowledge base. 

FIG. 1 depicts a high-level block diagram of a typical
implementation of IMS 20. Some embodiments of IMS can 
be summarized as follows, with reference to FIG. 1: A set of 
training data 22 is obtained from actual system sensors or 
simulated system performance. Nominal operating regions 
(or “clusters”) are extracted from the training data by IMS,
typically operating through an inductive learning module 24, 
and stored in a system monitoring knowledge base, or 
cluster database 26. As data is acquired from sensors on an 
actual operating system, that is, monitored system data 28,
a system monitoring module 30 of IMS compares the data
acquired from monitored system 28 with the clusters of the 
cluster database 26 to determine the monitored system’s 
performance or status 32. Thus, IMS learns system behavior 
and stores that knowledge in a database, which is compared 
with data from actual operating systems to monitor that
system’s performance or “health.” 

FIG. 2 depicts a vector 40 or data vector. The data vector 
40 is the basic data structure of the IMS and comprises a set 
of N parameter values, P_1 42 to P_N 44. Each vector 40 is an
ordered list of data parameters 42-44 (also referred to as
vector elements or members). The data parameter values 
42-44 are collected from the training system or the moni- 
tored system by a data acquisition process, or produced by 
means of a system simulation. In another embodiment, the 
vectors 40 can contain derived parameter values computed
from the collected data parameter values and/or parameter 
values derived from data acquired at different times (from 
different data “frames”). The values used in a given data 
vector may be collected simultaneously by the data acquisition
system, or collected over a period of time. The user
specifies the size and contents of the vector structure appro- 
priate for the monitoring application. 

FIG. 3 depicts an exemplary data vector 50. The name of 
each data parameter is shown above its value. For example, 
data parameter 52, Pressure A, has a value of 2857.2. The
names of the other data parameters are Valve 1 Position 54, 
Pressure B 56, Valve 2 Position 58, Pressure C 60, Tem- 
perature 1 62 and Temperature 2 64. 

IMS is advantageously used for those cases in which it is 
particularly difficult to construct detailed system diagnostic
models of the system (or some components thereof) due to 
the complexity or unavailability of design information. IMS 
system monitoring knowledge bases, or cluster databases,
can be constructed entirely from several sets of nominal 
sensor data, either gathered directly from the sensors of the 
system to be monitored during actual operation, or gathered 
from system simulations, or both. IMS analyzes the nominal 
system data and forms classes, or clusters, of expected 
system sensor values. These classes are used to build a 
system monitoring knowledge base. Data used to construct 
classes of expected sensor values and the system knowledge 
database are called “training data” whether obtained from 
actual system operation, simulations or both. 

FIG. 4 depicts a typical cluster 70. IMS “learns” the 
operating characteristics of the system by processing train- 
ing data sets containing nominal system data collected either 
from the monitored system itself or from an accurate simu- 
lation of the system, or both. IMS processes the training data 
by formatting the data into the predefined vector format and 
building a knowledge base containing clusters of related 
value ranges for the vector parameters. In some embodi- 
ments of the present invention, each cluster defines a range 
of allowable values for each parameter in a given vector. 

In cluster 70, a first cluster vector 72 comprises the values 
of the upper limit, that is, a high value, for the parameters, 
P_{1,high} to P_{N,high}, 74 to 76, respectively; and a second cluster
vector 78 comprises the lower limit, that is, a low value, for
the parameters P_{1,low} to P_{N,low}, 80 to 82, respectively. In one
embodiment, a cluster defines a range of values for a subset 
of the parameters of a vector. Alternately, a cluster defines a 
range of values for each parameter of a vector. 

During system monitoring, the parameter values collected 
from the monitored system for a given vector are compared 
to the values of the clusters in the cluster database. If the 
clusters were generated from nominal data and if all the 
values of a vector from the monitored system fall within or 
near the parameter ranges defined by one of these clusters, 
that vector is deemed to contain nominal data. 
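
As an illustration only, a cluster held as a pair of low and high vectors, together with the "falls within" test described above, might be sketched in Python as follows. The `Cluster` class and its `contains` method are hypothetical names chosen for this sketch, and the "near" case (which needs a distance metric and a threshold) is not covered here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    """A cluster stored as per-parameter lower and upper limits."""
    low: List[float]   # lower limit for each parameter
    high: List[float]  # upper limit for each parameter

    def contains(self, vector: Sequence[float]) -> bool:
        """True if every parameter lies within its [low, high] range."""
        return all(lo <= v <= hi
                   for lo, v, hi in zip(self.low, vector, self.high))


cluster = Cluster(low=[0.0, 10.0], high=[5.0, 20.0])
inside = cluster.contains([2.0, 15.0])    # both parameters in range
outside = cluster.contains([6.0, 15.0])   # first parameter above its limit
```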

FIG. 5 depicts an exemplary cluster 90 for the parameters 
of the vector of FIG. 3. The name of each parameter is 
shown above the values. The first cluster vector 92 com- 
prises the high values or upper limits for respective param- 
eters. The second cluster vector 94 comprises the low values 
or lower limits for respective parameters. 

Each data vector can be considered a point in N-dimen- 
sional space where N is the number of parameters in the 
vector. As described above, a cluster defines an interval, that 
is, a range of possible values for the parameters in a vector. 
A vector representing a point contained in a cluster will 
contain a value for each parameter within the range specified 
in the cluster for the corresponding parameter. The high and 
low values for each parameter in the vectors in a cluster can 
be thought of as points that define the corners of an N-dimensional
“minimum bounding rectangle.” All points contained
in a cluster will thus be contained inside or on an edge
of that rectangle. FIG. 5 depicts an exemplary cluster 
showing minimum and maximum parameter values for the 
parameters of the vector of FIG. 3. 

FIG. 6 depicts an example of a two-dimensional mini- 
mum bounding rectangle 900 for an exemplary cluster. The 
cluster has two parameters, first and second parameters, P_1
and P_2, respectively. The first parameter P_1 is plotted on the
x-axis 920. The second parameter P_2 is plotted on the y-axis
940. A first cluster vector, that defines the upper limits,
contains parameter values (P_{1,high}, P_{2,high}) and forms an
upper corner 960 of the minimum bounding rectangle 900.
A second cluster vector, that defines the lower limits, contains
parameter values (P_{1,low}, P_{2,low}) and forms a lower
corner 980 of the minimum bounding rectangle 900. The
other corners, 100 and 102, of the minimum bounding
rectangle 900 are defined by the points (P_{1,low}, P_{2,high}) and
(P_{1,high}, P_{2,low}), respectively.

FIG. 7 depicts a flowchart of an embodiment of the 
generation of a cluster database by the inductive learning
module 24 of FIG. 1. In step 110, the inductive learning 
module typically begins the training process with an empty 
cluster database. In step 112, the inductive learning module 
determines if any training data is available. If not, in step 
114, the process ends. If, in step 112, the inductive learning
module determines that training data is available, in step 
116, the inductive learning module reads the training data 
and formats the training data into an input vector. Alternately,
the inductive learning module receives real-time training
data. In step 118, the inductive learning module determines
if the cluster database is empty. If so, in step 120, the 
inductive learning module forms a cluster based on the 
values of the input vector, inserts the cluster into the cluster 
database, and proceeds to step 112. In one embodiment, to 
form the cluster, the inductive learning module adds predetermined
high initialization values to respective parameter
values of the input vector to form a high value for each 
parameter of the cluster, and subtracts predetermined low 
initialization values from respective parameters of the input 
vector to form a low value for each parameter of the cluster.
When a new cluster is formed from a single input vector, it 
is frequently advantageous to expand the parameter values 
of the input vector to accommodate for data uncertainty 
and/or to provide for a more general initial cluster definition. 
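
The cluster initialization of step 120 can be sketched as follows. This is a minimal illustration, assuming per-parameter initialization offsets supplied as lists; the function name `new_cluster` and its argument names are hypothetical.

```python
def new_cluster(vector, low_init, high_init):
    """Form an initial cluster from a single input vector.

    low_init / high_init are the predetermined initialization values
    subtracted from / added to each parameter (assumed per-parameter lists).
    Returns (low_values, high_values).
    """
    low = [v - d for v, d in zip(vector, low_init)]
    high = [v + d for v, d in zip(vector, high_init)]
    return low, high


low, high = new_cluster([10.0, 2.0], low_init=[1.0, 0.5], high_init=[2.0, 0.5])
```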
If, in step 118, the inductive learning module determines
that the cluster database is not empty, in step 122, the 
inductive learning module determines which cluster in the 
cluster database has parameter values closest to the input 
vector. “Closest” is defined in terms of a defined “distance” 
between the input vector and the cluster. To determine the
distance, a variety of definitions for a distance metric can be 
used. In one embodiment, the standard Euclidean distance 
metric determines a distance D between the input vector and 
a selected point in the cluster as follows in Eq. 1:

$$D(X, Y) = \sqrt{\sum_{i=1}^{N} (X_i - Y_i)^2} \qquad \text{(Eq. 1)}$$

in which the summation is over all N parameters in each
vector X and Y.
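
For concreteness, the standard Euclidean distance between two N-parameter vectors can be computed as in the short Python sketch below. The function name `euclidean_distance` is illustrative only.

```python
import math


def euclidean_distance(x, y):
    """Standard Euclidean distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of parameters")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```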

The distance metric, whether defined by Eq. 1 or an 
alternative, typically determines the distance between two
vectors or points in the N dimensional space. Since a cluster 
will generally contain more than one vector, determining the 
distance from a vector to a cluster involves generalizing the 
application of a distance metric to accommodate a multi-vector
cluster. Several procedures can be used. A cluster
reference point within the cluster can be selected and all 
“distances to the cluster” defined to be the distance to the 
cluster reference point. A centroid value can be computed for 
each parameter of the vectors in the cluster and this centroid 
point used as the cluster reference point. Other weighted
averages of the parameter values or mean parameter values 
of each vector in the cluster can also be used to determine 
a cluster reference point. Specific examples are described 
below. 

The concept of “distance to a cluster” is not restricted to
the use of a single cluster reference point with “distance” 
determined according to a distance metric from the input 
vector to the reference point. The goal of such a calculation
is to provide a quantitative measure of “nearness” of the 
input vector to various clusters in the database such that, in 
generating the cluster database, the input vector can be 
assigned to an existing cluster or used to initiate another
cluster. For system monitoring, the “nearness” definition 
should be capable of distinguishing normal from anomalous 
system behavior. These criteria can be fulfilled by a “dis- 
tance” concept involving several vectors of the cluster as 
well as the use of a “distance” to a single exemplary cluster
reference point. For example, in some embodiments of the 
present invention, the parameter values of the input vector 
are compared to the upper and lower parameter values of the 
vectors of the cluster and the differences are summed (in 
some embodiments, taking account of algebraic signs of the
differences to account for input vectors lying within a 
previously-determined cluster). Such a multi-vector “dis- 
tance” concept is used in the example of the Hybrid Com- 
bustion Facility described herein. 

Using the Euclidean distance metric of Eq. 1, the parameter
values of the input vector are represented by X_i and the
respective parameter values of the cluster reference point are
represented by Y_i.

The cluster reference point can be selected in a variety of 
ways. For example, in one embodiment, the cluster reference
point is taken to be the cluster centroid, which is defined as 
the average of the high and low values for each parameter in 
the vectors contained in the cluster. Referring to FIG. 6, 
point 104 represents the centroid. In another embodiment, 
the cluster reference point is defined as the cluster mean
vector constructed using the mean value of every parameter 
contained in the cluster. Yet another embodiment defines a 
cluster reference point as that vector in the cluster having 
parameter values as close as possible to each corresponding 
parameter value in the input vector. Using this embodiment,
in FIG. 6, point 100 is the closest point in the cluster to the 
input vector 106, and the distance D is determined between 
points 100 and 106 using the Euclidean distance formula
above. Based on the distance from the input vector to the 
cluster reference point, D, the inductive learning module
selects the cluster with the shortest distance D to the input 
vector as the closest cluster. 
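
Two of the reference-point choices described above can be sketched in Python as follows. This is an illustration under the assumption that a cluster is fully described by its low and high vectors, so that the closest point in the cluster is obtained by clamping each input parameter into the cluster's range; the function names are hypothetical.

```python
import math


def centroid(low, high):
    """Average of the high and low values for each parameter."""
    return [(lo + hi) / 2 for lo, hi in zip(low, high)]


def closest_point(low, high, x):
    """Point of the bounding rectangle nearest to x (per-parameter clamp)."""
    return [min(max(xi, lo), hi) for lo, hi, xi in zip(low, high, x)]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


low, high = [0.0, 0.0], [4.0, 2.0]
x = [7.0, 6.0]                                 # input vector outside the cluster
c = centroid(low, high)
p = closest_point(low, high, x)                # lands on a corner here
d_closest = euclidean(x, p)
d_centroid = euclidean(x, c)
```

In this example the closest-point distance is smaller than the centroid distance, which is consistent with the observation that measuring to the closest point tends to yield shorter distances.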

Other techniques can also be used for selecting the cluster 
reference point. Different choices for cluster reference
points can affect the performance of IMS by altering the
parameter tolerances in individual clusters and the number 
of clusters in the final IMS knowledge base. For example, 
determining the distance between a vector and the closest 
point in a cluster rather than the centroid would typically 
result in shorter distances and be likely to incorporate more
training vectors into fewer clusters rather than create a larger 
number of distinct clusters. Conversely, determining dis- 
tance from the input vector to the furthest point in the cluster 
would be likely to result in a larger number of smaller 
clusters in the database. Smaller clusters (i.e., a smaller
distance between upper and lower bounds) would typically 
provide tighter monitoring tolerances, which may not be 
desirable on systems with low accuracy or noisy sensors. 
Also, larger numbers of clusters in the database could 
increase the search time to identify the “nearest” cluster to
an input vector, which may be disadvantageous for moni- 
toring a system with a high data rate or monitoring with a 
slower computer. A balancing of database size with moni- 
toring tolerance for the particular application is typically 
desirable.

After determining the cluster that is closest to the input 
vector, in step 124, the inductive learning module determines
if the input vector is contained within the closest
cluster of step 122. If so, the inductive learning module 
proceeds to step 112. 

If, in step 124, the inductive learning module determines 
that the input vector is not contained within the closest 
cluster, in step 126, the inductive learning module deter- 
mines whether the parameter values of the input vector are 
close enough to include that input vector in the closest 
cluster. A “cluster-expansion-distance threshold value,” that 
is specified by a user, defines a maximum distance between 
the input vector and the closest cluster such that the input 
vector will be incorporated into the cluster. If the distance 
from the input vector to the closest cluster, determined 
above, is not greater than the specified cluster-expansion- 
distance threshold value, in step 128, the inductive learning 
module expands the closest cluster to include the input 
vector. In other words, the cluster parameter ranges are 
expanded to include the input vector in the cluster, thereby 
redefining the extent of the closest cluster. Step 128 proceeds 
to step 112. 

In another embodiment of step 128, the cluster parameter 
ranges are expanded more than the minimum values that 
would include the input vector in the cluster. Such super- 
minimum parameter expansion is a useful procedure for 
allowing for uncertainty in the data measurements and/or to 
further generalize the training examples. The desired expan- 
sion factors can also be specified by the user. For example, 
the range for each updated parameter in the cluster can be 
expanded by the difference between the value of the param- 
eter in the input vector and the nearest value to the input 
vector, plus a percentage of that value. A 2% expansion
factor is adequate in many cases, particularly in cases with 
fairly focused clusters. One could also expand by a percent- 
age of the difference between the parameter in the input 
vector and the cluster centroid. In addition, one could use an 
expansion factor to compensate in an approximate way for 
the estimated accuracy of the particular sensor generating 
the parameter. That is, if a sensor is known to be accurate to 
approximately 5%, one can add or subtract 5% for the value 
of that parameter in the vector. Other correction procedures 
can also be applied depending on the particular character- 
istics of the system under study and the monitoring appli- 
cation. 
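
One possible reading of the super-minimum expansion described above is sketched below for a single parameter. The sketch assumes that the extra margin is a fraction of the difference between the input value and the nearest cluster limit; the text could also be read as a fraction of the parameter value itself. The function name `expand_range` and the `factor` argument are hypothetical.

```python
def expand_range(low, high, value, factor=0.02):
    """Expand one parameter range [low, high] to include `value`,
    plus an extra margin of `factor` times the gap that was closed.

    Assumption: the percentage applies to the gap, not to the raw value.
    """
    if value > high:
        high = value + factor * (value - high)
    elif value < low:
        low = value - factor * (low - value)
    return low, high


grown_up = expand_range(0.0, 10.0, 12.0, factor=0.5)     # gap of 2 above
grown_down = expand_range(0.0, 10.0, -4.0, factor=0.25)  # gap of 4 below
unchanged = expand_range(0.0, 10.0, 5.0)                 # already inside
```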

If, in step 126, the inductive learning module determines 
that the parameter values of the input vector are not suffi- 
ciently close to include the input vector in the cluster, the 
inductive learning module proceeds to step 120 to form the 
input vector into a new cluster. In other words, if the distance 
between the input vector and the closest cluster is greater 
than the cluster-expansion-distance threshold value (possi- 
bly including any expansion factors), a new cluster is 
generated based on the input vector. 
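
The learning procedure of FIG. 7 (steps 110 through 128) can be summarized in a short Python sketch. It is an illustration under several assumptions: distance to a cluster is measured to the closest point of the cluster's bounding rectangle, a single `init_margin` is used for both the high and low initialization values, and expansion is the minimum needed to include the vector. All function and argument names are hypothetical.

```python
import math


def closest_point(cluster, x):
    """Clamp each parameter of x into the cluster's [low, high] range."""
    return [min(max(v, lo), hi)
            for v, lo, hi in zip(x, cluster["low"], cluster["high"])]


def distance_to_cluster(cluster, x):
    p = closest_point(cluster, x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p)))


def contains(cluster, x):
    return all(lo <= v <= hi
               for v, lo, hi in zip(x, cluster["low"], cluster["high"]))


def train(vectors, expansion_threshold, init_margin=0.0):
    clusters = []                                    # step 110: empty database
    for x in vectors:                                # steps 112 and 116
        if clusters:                                 # step 118
            nearest = min(clusters,
                          key=lambda c: distance_to_cluster(c, x))  # step 122
            if contains(nearest, x):                 # step 124
                continue
            if distance_to_cluster(nearest, x) <= expansion_threshold:  # step 126
                # step 128: expand the ranges just enough to include x
                nearest["low"] = [min(lo, v) for lo, v in zip(nearest["low"], x)]
                nearest["high"] = [max(hi, v) for hi, v in zip(nearest["high"], x)]
                continue
        # step 120: start a new cluster around x
        clusters.append({"low": [v - init_margin for v in x],
                         "high": [v + init_margin for v in x]})
    return clusters


clusters = train([[0, 0], [1, 1], [10, 10]], expansion_threshold=2.0)
```

With these inputs the second vector is absorbed into the first cluster, while the third is too far away and starts a cluster of its own.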

It is often advantageous to scale or normalize the data 
values before they are inserted into the vectors. Arbitrary 
choices for units of measurement can cause some data 
values to be represented by large numbers while other data 
values are represented by small numbers, tending to skew 
calculations by over-weighting the large numbers. This is 
frequently undesirable and can be ameliorated or avoided by 
normalizing data values. Several normalization procedures 
can be employed. 

FIG. 8 depicts a flowchart of an embodiment of the 
inductive learning module which scales or normalizes at 
least a subset of the input parameter values of the input 
vector. FIG. 8 is the same as FIG. 7 except for block 132; 
therefore the differences will be described. Step 116 reads 
the data to fill an input vector. In step 134, the inductive 
learning module scales at least a subset of the input data of
the input vector. In another embodiment, represented by 
block 132, steps 116 and 134 are combined to scale the input 
data as it is read and before it is stored in the input vector. 

For example, in one embodiment, each parameter is 
scaled to represent a percentage (or fraction) of a maximum 
range for that parameter, thereby causing every data value, 
d_i, of the vector to lie in the range 0 ≤ d_i ≤ 100.0 (or
0 ≤ d_i ≤ 1.00). This normalization procedure can be used to
give the user a more intuitive understanding of the moni- 
toring knowledge database and the significance of any 
off-nominal system behavior encountered during system 
monitoring. 
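
The percentage-of-range scaling described above might look like the following sketch. It assumes that a minimum and maximum are known for each parameter; the function name `scale_to_percent` and the sensor ranges are hypothetical.

```python
def scale_to_percent(values, ranges):
    """Scale each value to a percentage of its parameter's (lo, hi) range."""
    scaled = []
    for value, (lo, hi) in zip(values, ranges):
        if hi <= lo:
            raise ValueError("each range needs hi > lo")
        scaled.append(100.0 * (value - lo) / (hi - lo))
    return scaled

# Hypothetical ranges: a pressure spanning 0-500 and a temperature spanning 200-400.
print(scale_to_percent([250.0, 250.0], [(0.0, 500.0), (200.0, 400.0)]))
```

The same raw reading of 250 lands at different percentages for the two parameters, which is the point of the normalization.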

In another embodiment, to scale the data, parameter 
values are weighted in comparison to other parameters, 
rather than being normalized to provide each parameter with 
substantially equal weight. For example, scaling a parameter 
to have a larger possible range relative to other parameters 
in the vector will tend to amplify any deviations in that 
parameter. 
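
The amplifying effect of weighting can be illustrated with a short sketch. The weights, the readings, and the helper name `weighted` are assumptions chosen for illustration only.

```python
import math

def weighted(values, weights):
    """Multiply each (already normalized) parameter by its weight."""
    return [v * w for v, w in zip(values, weights)]

nominal = [50.0, 50.0]
observed = [50.0, 52.0]   # second parameter deviates by 2 units
weights = [1.0, 3.0]      # hypothetical: emphasize the second parameter

plain = math.dist(nominal, observed)
emphasized = math.dist(weighted(nominal, weights), weighted(observed, weights))
print(plain, emphasized)
```

With a weight of 3 on the second parameter, the same deviation contributes three times as much to the Euclidean distance.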

In addition to those described herein, other combinations 
of data normalization and distance metrics may be useful for 
various situations and can be empirically determined with 
typical training data and, in some embodiments, assumed 
off-nominal system data. 

After IMS processes all of the training data, the result is 
a database of clusters (the system monitoring knowledge 
base) that characterizes system performance in the operating 
regimes covered by the training data. Each cluster defines 
one set of constraints on the values allowed for each 
parameter in any particular monitoring input vector. If there 
is no cluster in the monitoring knowledge base that contains 
a given input vector or is “near” that input vector, then the 
system is behaving in an unexpected manner indicating a 
possible system anomaly. 

In another embodiment, IMS comprises a system moni- 
toring module 30 (FIG. 1) that monitors a system by 
comparing incoming data with the clusters in the cluster or 
knowledge database to determine the “distance” (defined 
below) from the incoming sensor data to the nearest knowl- 
edge base cluster (or clusters). If the incoming sensor data 
lies sufficiently close to a cluster of the knowledge base 
derived from nominal system performance, the system is 
deemed to be functioning normally since it is sufficiently 
close to previous normal behavior. “Sufficiently close” is 
determined by threshold parameters supplied by the user to 
take into account that training data will not generally span 
the entire space of normal system operation, and the sensor 
data from nominally operating systems is inherently subject 
to a level of “noise,” causing deviations in data received 
from sensors even for identically-operating systems.

In the event an incoming data vector has the same 
minimum distance with respect to more than one cluster, that
is, a tie, several procedures are possible. In some embodi- 
ments, the primary result of system monitoring is the dis- 
tance to the nearest cluster from which a decision is made 
“nominal or off-nominal,” in which case it does not matter 
which of two or more equidistant nearest clusters are con- 
sidered. Ties become significant if the system monitoring 
procedure further examines the relationship of the input data 
vector to the nearest clusters (such as deriving information 
from the degree of deviation of individual parameters), or if 
clusters are distinguished among themselves (e.g. nominal 
vs failure, or clusters deriving from different operating 
modes). Distinguishing nearest neighbor ties in such cases 
typically depends on the details of the particular application 
under study.

An important function of IMS is to monitor system health,
and IMS is not primarily intended to provide thorough
system diagnostic information for off-nominal performance.
However, some diagnostic information is available with
IMS that can, in some circumstances, provide useful
guidance. For example, the amount by which off-nominal data
fails to meet the required threshold criteria (that is, the
“distance” from the nearest cluster) can provide information
as to the severity of the anomalous performance. In addition,
the particular cluster closest to the suspicious system data
can also provide useful guidance as to the possible nature of
the anomalous performance (and/or exclude certain types of
system malfunctions).

If IMS detects off-nominal system performance, it can
respond in one or more of the following ways: a)
Alert the operator to the suspicious data, b) Activate system
diagnostic procedures and/or software, c) Alter the mode of
system operation in response to the suspicious data,
including initiating emergency shut-down, d) Determine the
degree of “off-nominal” behavior and/or the nearest class
defining nominal performance, and select one or more
responses based on the results of this determination.

Automatic system health monitoring can significantly
benefit from an accurate characterization or model of
expected system behavior, that is, “nominal” behavior.
Among the advantages of IMS is the avoidance of
difficulties often encountered in producing detailed health
monitoring and/or diagnostic models of some systems and/or
components, typically arising from the complexity and/or
the unavailability of design information. Many current
health monitoring schemes simply monitor system
parameters one-by-one to ensure they do not exceed predetermined
extreme thresholds. Such monitoring systems may not be
able to detect early signs of anomalous behavior not
involving the beyond-threshold excursion of any one parameter.

Some monitoring systems utilize an “expert” knowledge
base or a detailed system model to provide tighter
monitoring tolerances. Such techniques may not be feasible when
system complexity and/or lack of resources (computing or
otherwise) makes it difficult to develop such detailed
models. In addition, even when such an expert knowledge base
or detailed system model can be constructed, it is frequently
too complex for feasible computer processing in real-time.
It is desired in many applications that system health be
monitored rapidly as the system is in service to detect and
commence ameliorating action before off-nominal behavior
becomes irretrievable or catastrophic. Examples are
presented herein in which IMS responds in sub-millisecond
times when monitoring the health of a complex engineered
system (in particular, NASA’s Hybrid Combustion Facility).

Other technologies, such as neural networks and decision
trees, have been applied to the monitoring of complex
systems attempting to overcome the modeling difficulties
noted above. Such techniques typically suffer from the
limitation of requiring both nominal and off-nominal
training data in order to produce a feasible monitoring system. In
addition, such monitoring systems typically produce system
“models” that are difficult for humans to interpret. Since
off-nominal system data is frequently difficult to obtain, IMS
is designed to avoid this difficulty by building a system
monitoring knowledge base entirely from nominal system
data. The resulting IMS knowledge base clearly shows
relationships between system parameters during normal
operation and, in most cases, is easily processed to provide
real-time (or near real-time) monitoring ability.

FIG. 9 depicts a flowchart of an embodiment of the
system monitoring module 30 of FIG. 1. In step 140, the
system monitoring module receives an input vector, also
referred to as a monitored-system vector. In one
embodiment, the system monitoring module formats real-time (or
near-real-time) input data into monitored-system vectors. In
another embodiment, the system monitoring module scales
and/or normalizes the monitored-system vectors as
described above with reference to FIG. 8. In step 142, the
system monitoring module determines whether any cluster
contains the monitored-system vector. If so, in step 144, the
system monitoring module indicates that the system is
operating within nominal parameters and proceeds to step
140. If, in step 142, the system monitoring module
determines that no cluster contains the monitored-system vector,
in step 146, the system monitoring module queries the
cluster database to determine which cluster is closest to the
monitored-system vector.

In step 148, the system monitoring module determines
whether the distance between the monitored-system vector
and the closest cluster exceeds a predetermined tolerance. In
some embodiments, the system monitoring module is
allowed to account for incomplete training data or
measurement inaccuracies by setting a tolerance on the maximum
allowable distance between the monitored-system vector
and the closest cluster for the monitored-system vector to be
considered nominal data. That is, the input
monitored-system vector is “close enough” to its closest cluster to be
considered nominal. If, in step 148, the distance is less than
or equal to the predetermined tolerance, in step 150, the
system monitoring module indicates that the monitored
system may be operating properly. If, in step 148, the
distance is greater than the predetermined tolerance, in step
152, the system monitoring module indicates that the
monitored system is operating outside nominal parameters.
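
The flow of FIG. 9 can be sketched roughly as follows. The sketch assumes clusters stored as per-parameter lower and upper bounds and a Euclidean distance to the nearest point of each box; `box_distance`, `monitor`, and the returned strings are hypothetical names, not part of the described system.

```python
import math

def box_distance(vector, cluster):
    """Euclidean distance from a vector to the nearest point of a cluster box."""
    lows, highs = cluster
    return math.sqrt(sum(max(lo - v, 0.0, v - hi) ** 2
                         for v, lo, hi in zip(vector, lows, highs)))

def monitor(vector, clusters, tolerance):
    """Classify one monitored-system vector against the cluster database."""
    distance = min(box_distance(vector, c) for c in clusters)
    if distance == 0.0:
        return "nominal"                   # steps 142/144: a cluster contains it
    if distance <= tolerance:
        return "nominal within tolerance"  # steps 148/150: close enough
    return "off-nominal"                   # step 152

clusters = [([0.0, 0.0], [1.0, 1.0]), ([4.0, 4.0], [5.0, 5.0])]
print(monitor([0.5, 0.5], clusters, tolerance=0.5))
print(monitor([1.3, 1.0], clusters, tolerance=0.5))
print(monitor([2.5, 2.5], clusters, tolerance=0.5))
```

This brute-force version examines every cluster; the indexing schemes described later exist to avoid exactly that.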

The system monitoring module of IMS can also, in some
embodiments, make use of multiple tolerance values
(“TVs”) to determine the level of alert. An input vector
outside the tightest tolerance TV_0 may justify that an alert be
issued to the system operator. Input vectors lying outside
increasingly generous tolerances, TV_1, TV_2, TV_3, . . .
(TV_0 < TV_1 < TV_2 < TV_3 . . . ) could cause IMS to alert the
system operator to the possibility of increasingly severe
system malfunctions. For monitoring systems requiring
extremely fast response times, a single distance tolerance of
0 may be appropriate (i.e., 0 = TV_0 = TV_1 = TV_2 = TV_3 . . . ).
This allows queries to the knowledge base to check only
whether or not an input vector is contained in a cluster,
thereby saving the time that would be required to compute
the vector’s distance from each cluster examined.
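
One way to map a distance onto graded alert levels is sketched below. The tolerance values and the name `alert_level` are assumptions for illustration; the text does not prescribe how levels are numbered.

```python
from bisect import bisect_left

def alert_level(distance, tolerances):
    """Count how many of the ascending tolerances the distance exceeds."""
    return bisect_left(tolerances, distance)

TOLERANCES = [1.0, 2.0, 4.0]  # hypothetical TV_0 < TV_1 < TV_2

for d in (0.0, 1.0, 1.5, 5.0):
    print(d, alert_level(d, TOLERANCES))
```

A level of 0 means the vector is inside the tightest tolerance, and each higher level means one more tolerance has been exceeded.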

Some embodiments of IMS can gauge general system
health or track general system behavior over time. For
example, a larger vector can be formed by concatenating two
or more consecutive sets of sensor data and this larger vector
used with IMS techniques. Some embodiments of IMS can
monitor the distance from nominal clusters of input vectors
received during the time interval of interest, including the
information that input vectors lie within a cluster, outside a
cluster but within the tightest tolerance, the identity of the
nearest cluster, among other attributes. Time dependent
behavior of such distance data can be displayed in graphical
form for the operator, transmitted to a diagnostic program
for more detailed analysis, stored for future display and/or
analysis, among other processing options. Another
technique by which IMS can capture time dependent behavior
over shorter time scales is to form vectors from several
consecutive sets or frames of sensor data. That is, one vector
includes data from several (or a great many) data readings
collected from the same sensors. For example, successive
sensor readings can be summed and divided by the number
of readings to construct an average value for that particular
parameter. Median values can also be accumulated, among
other approaches to collecting time-accumulated sensor
data. This approach provides different information from
collecting and analyzing IMS output data over time, and
provides an important and useful technique for monitoring
system behavior.
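
Both ways of forming vectors from several frames of sensor data can be sketched briefly. The frame values and the helper names `concatenate_frames` and `summarize_frames` are hypothetical.

```python
from statistics import mean, median

def concatenate_frames(frames):
    """Join consecutive sensor frames into one larger vector."""
    return [value for frame in frames for value in frame]

def summarize_frames(frames, reducer=mean):
    """Reduce each sensor's readings across frames to a single value."""
    return [reducer(readings) for readings in zip(*frames)]

# Three consecutive frames from the same two sensors (made-up values).
frames = [[1.0, 10.0], [2.0, 10.0], [6.0, 13.0]]
print(concatenate_frames(frames))
print(summarize_frames(frames))
print(summarize_frames(frames, reducer=median))
```

Concatenation preserves the ordering of readings within the window, whereas averaging or taking the median trades that detail for a vector of the original length.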

Indexing and Retrieval 

An efficient cluster indexing and retrieval method is 
advantageous in increasing the speed of IMS and allowing 
IMS to operate in real-time or near-real-time. In order to
allow searching the IMS cluster database for the closest 
cluster, some embodiments of the indexing and retrieval 
scheme include a distance metric by which “closest” is 
defined and the ability to return the record of the cluster that 
is closest to the query point (input vector), not limited to 
those clusters containing the query point. The speed of 
search and retrieval should also be sufficiently fast so as to 
keep up with the anticipated rate of data acquisition. An 
efficient indexing and retrieval scheme can also help to 
increase the speed of the initial IMS training process, since 
training also performs “closest cluster” queries. 

Various embodiments of the cluster indexing and retrieval 
methods used in connection with the present invention make 
use of one or more indexing reference points. It is conve- 
nient to describe first those embodiments employing a single 
indexing reference point and then describe generalizations to 
those embodiments using multiple indexing reference 
points. 

FIG. 10 depicts a flowchart of typical embodiments of a 
cluster indexing technique employing a single indexing 
reference point. In some embodiments, the cluster indexing 
technique is part of the inductive learning module. In other 
embodiments, the cluster indexing technique is implemented 
in a separate module. The cluster indexing and retrieval 
technique typically makes use of the cluster distance con- 
cept. In step 160, an indexing reference point is selected. The 
choice of the indexing reference point is substantially arbi- 
trary and can be chosen for convenience of calculation. For 
example, the indexing reference point can be the origin (all
parameter values are zero) or a point centrally located among
many clusters, among other choices.

In step 162, a distance, referred to as the ordering dis- 
tance, from the indexing reference point to each cluster is 
determined. As noted above, “distance to a cluster” can be 
determined from the indexing reference point to a cluster 
reference point selected for each cluster, or making use of 
maximum and minimum values of the distance from the 
indexing reference point to cluster members, among other 
methods. The ordering distance may be determined using the 
Euclidean distance formula described above, among others.
Any of the techniques above for determining a cluster 
reference point may be used. 

In step 164, the clusters are then sorted in accordance with 
their associated ordering distances. In step 166, the clusters 
are organized into a data structure, such as, for example, a 
list or search tree structure, based on the ordering distances. 
For example, the origin, in which all vector parameters are 
equal to zero, can be selected as the indexing reference 
point, and the minimum distance between the origin and 
each cluster is used as the cluster index. The clusters can 
then be sorted into a list with ascending index (ordering 
distance) value. 
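
The indexing steps of FIG. 10 with the origin as the indexing reference point might be sketched as follows. The sketch assumes clusters stored as per-parameter lower and upper bounds and uses the minimum Euclidean distance from the reference point to each box as the ordering distance; `min_distance` and `build_index` are hypothetical names.

```python
import math

def min_distance(point, cluster):
    """Minimum Euclidean distance from a point to a cluster box."""
    lows, highs = cluster
    return math.sqrt(sum(max(lo - p, 0.0, p - hi) ** 2
                         for p, lo, hi in zip(point, lows, highs)))

def build_index(clusters, reference):
    """Steps 162-166: sort clusters by ascending ordering distance."""
    return sorted(clusters, key=lambda c: min_distance(reference, c))

clusters = [([5.0, 5.0], [6.0, 6.0]),
            ([1.0, 1.0], [2.0, 2.0]),
            ([9.0, 0.0], [10.0, 1.0])]
index = build_index(clusters, reference=(0.0, 0.0))
for cluster in index:
    print(round(min_distance((0.0, 0.0), cluster), 2), cluster)
```

A sorted list is the simplest of the data structures mentioned; a search tree keyed on the same ordering distance would serve equally well.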

In other embodiments, the list of clusters can be 
assembled into a plurality of sublists, typically determined 
by a range of distances from the indexing reference point to
the cluster. For example, one sublist can be constructed
containing clusters from 0 to 100 distance units away from
the indexing reference point, a second sublist containing
clusters from greater than 100 to 200 distance units away from
the indexing reference point, and so forth until all clusters
are included in at least one sublist. As described in detail
below, multiple indexing reference points can also be 
employed, in which case multiple sublists will result, and a 
cluster can appear in more than one sublist. 
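
Grouping clusters into distance-range sublists can be sketched as below. The sketch assumes a single indexing reference point, assigns each cluster by its ordering distance alone, and uses a sublist width of 100 distance units to match the example above; `build_sublists` and the cluster identifiers are hypothetical.

```python
import math
from collections import defaultdict

def build_sublists(indexed, width=100.0):
    """Group (ordering_distance, cluster_id) pairs into distance-range sublists.

    Sublist 0 covers 0 to width, sublist 1 covers greater than width
    to 2 * width, and so on.
    """
    sublists = defaultdict(list)
    for distance, cluster_id in sorted(indexed):
        bucket = max(0, math.ceil(distance / width) - 1)
        sublists[bucket].append(cluster_id)
    return dict(sublists)

indexed = [(250.0, "c3"), (0.0, "c0"), (100.0, "c1"), (100.5, "c2")]
print(build_sublists(indexed))
```

Within each sublist the clusters stay in ascending order of ordering distance, so the list search described below still applies inside a sublist.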

FIGS. 11A and 11B collectively depict a flowchart of a
typical technique to access or to locate the cluster closest to
a given input vector. The inductive learning module and/or
the system monitoring module may use the technique of
FIGS. 11A and 11B. The technique will be described in the
context of the system monitoring module.

In step 170, the system monitoring module receives a 
vector, V. In step 172, the system monitoring module deter- 
mines the distance Dv between the predefined indexing 
reference point that was used to index the clusters and the 
vector V. In step 174, an initial cluster Cc is selected from 
the cluster database. In some embodiments, cluster Cc is 
close to the vector V. Cluster Cc can be selected at random 
but it is advantageous if Cc is reasonably close to vector, V. 
In step 176, the distance Cv from the cluster Cc to V is 
determined. 

If the clusters are organized into sublists, the sublists can 
be advantageously employed to help in the selection of a 
cluster Cc close to the input vector V. For example, the initial 
cluster Cc is conveniently chosen as the first cluster in that 
sublist that covers the distance range including the distance 
of the input vector from the indexing reference point. 

It can then be shown that any cluster in the database that
is closer to the vector V than Cc must include points that are
a distance between MAX{0, (Dv-Cv)} and (Dv+Cv) away
from the indexing reference point. For economy of notation
we will use the expression Dv-Cv herein, understanding that
if this expression is negative, it is taken to be zero. A proof
of this proposition is included in the Appendix, incorporated
herein by reference in its entirety. Thus, an efficient
searching procedure involves beginning the search at the beginning
of the list and computing distances from the vector V to
clusters that cover any portion of the distance range of
(Dv±Cv). If a cluster is found that is closer to V than Cc, say
Ck, adjust Cv to reflect the distance from vector V to cluster
Ck and continue the search. If a cluster is found that contains
V, stop the search and return that cluster. Otherwise,
continue the search in ascending index order until the minimum
distance from vector V to the current cluster is greater than
(Dv+Cv) and return the closest cluster located thus far.

In step 178, a counter, i, is set equal to one to begin at a
first cluster C1 on a list of clusters. Step 180
determines if cluster Ci contains the vector V. If so, in step
182, the search ends. If not, step 184 determines if cluster
Ci includes points that are a distance between (Dv-Cv) and
(Dv+Cv) away from the indexing reference point. If so, in
step 186, a minimum distance Di between the vector V and
the cluster Ci is determined. Step 188 determines if Di is less
than Cv. If so, in step 190, Cv is adjusted to reflect the new
distance. In other words, Cv is set equal to Di. The flowchart
continues to FIG. 11B. Step 192 determines if there are more
clusters in the list to check. If so, step 194 increments i by
one to point to the next cluster, and proceeds to step 180. If
step 192 determines that there are no more clusters in the list
to check, in step 196, the process ends.

If step 184 determines that cluster Ci does not include
points that are a distance between (Dv-Cv) and (Dv+Cv)
away from the indexing reference point, the process
proceeds to step 192. If step 188 determines that Di is greater
than or equal to Cv, the process proceeds to step 192.
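
The pruned search can be sketched as follows. This is a sketch under stated assumptions rather than the flowchart itself: clusters are per-parameter lower and upper bounds, distances are Euclidean, each index entry carries both the minimum and maximum distance of its cluster from the reference point, and the names `build_index` and `closest_cluster` are hypothetical.

```python
import math

def min_distance(point, cluster):
    lows, highs = cluster
    return math.sqrt(sum(max(lo - p, 0.0, p - hi) ** 2
                         for p, lo, hi in zip(point, lows, highs)))

def max_distance(point, cluster):
    lows, highs = cluster
    return math.sqrt(sum(max(abs(p - lo), abs(p - hi)) ** 2
                         for p, lo, hi in zip(point, lows, highs)))

def build_index(clusters, reference):
    """Entries (min, max, cluster) sorted by minimum distance from reference."""
    entries = [(min_distance(reference, c), max_distance(reference, c), c)
               for c in clusters]
    return sorted(entries, key=lambda entry: entry[0])

def closest_cluster(v, index, reference):
    dv = math.dist(reference, v)      # Dv: reference point to V
    best = index[0][2]                # initial cluster Cc
    cv = min_distance(v, best)        # Cv: distance from Cc to V
    for d_min, d_max, cluster in index:
        if cv == 0.0 or d_min > dv + cv:
            break                     # V is contained, or no later cluster can be closer
        if d_max < dv - cv:
            continue                  # cluster lies entirely inside the excluded range
        di = min_distance(v, cluster)
        if di < cv:                   # closer cluster found: tighten Cv
            best, cv = cluster, di
    return best, cv

reference = (0.0, 0.0)
clusters = [([5.0, 5.0], [6.0, 6.0]),
            ([1.0, 1.0], [2.0, 2.0]),
            ([9.0, 0.0], [10.0, 1.0])]
index = build_index(clusters, reference)
print(closest_cluster((5.5, 4.0), index, reference))
print(closest_cluster((1.5, 1.5), index, reference))
```

Because the list is sorted by minimum distance and Cv only shrinks, the loop can stop at the first cluster whose minimum distance exceeds Dv+Cv.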

When the clusters are organized into sublists on the basis
of distance, a different search strategy can be
advantageously employed. The search begins by searching the
sublist that includes the distance from the input vector to the
indexing reference point, typically following the methods
described in FIGS. 11A and 11B. If a match is found (that is,
a cluster contains the input vector), the search terminates and
the cluster found to contain the input vector is returned as the
result of the search. If no match is found, adjacent (in
distance) sublists are searched in the same manner if they
cover any region of distance in the range Dv±Cv. If no
sublists satisfy these criteria, the most recently located
cluster nearest to the input vector is returned. The search
continues until all sublists covering the distance Dv±Cv
have been searched.

The present indexing and retrieval schemes are not
limited to a single indexing reference point, m, but can employ
multiple indexing reference points, m, n, o, p . . . In these
cases, the process of narrowing the cluster search is iterative.
Initially, a set of clusters is selected for searching whose
members are those clusters that lie within a distance
Dv_m±Cv of the first indexing reference point, m. From this
set of clusters, the “m set,” a second set of clusters is chosen
whose members also lie within a distance Dv_n±Cv of
reference point n, the “mn set.” This process proceeds,
producing sequentially “mnop . . . sets” of clusters until all
reference points have been used. Since the procedure is
essentially an iterative process of set intersection, at each
stage the size of the set of clusters to be searched will
typically decrease but may stay the same size. The cluster
search is thus narrowed to the (typically) smaller set of
clusters, for example, the mno set for the case of three
indexing reference points.
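
The iterative narrowing with several indexing reference points can be sketched as a chain of filters. The sketch assumes clusters stored as per-parameter lower and upper bounds, Euclidean distances, and a value of Cv already obtained from some initial cluster; `distance_range`, `narrow`, the reference points, and the value of Cv are hypothetical.

```python
import math

def distance_range(point, cluster):
    """(minimum, maximum) Euclidean distance from a point to a cluster box."""
    lows, highs = cluster
    nearest = math.sqrt(sum(max(lo - p, 0.0, p - hi) ** 2
                            for p, lo, hi in zip(point, lows, highs)))
    farthest = math.sqrt(sum(max(abs(p - lo), abs(p - hi)) ** 2
                             for p, lo, hi in zip(point, lows, highs)))
    return nearest, farthest

def narrow(v, clusters, references, cv):
    """Keep cluster indices that overlap Dv±Cv for every reference point."""
    selected = list(range(len(clusters)))
    for ref in references:
        dv = math.dist(ref, v)
        low, high = max(0.0, dv - cv), dv + cv
        kept = []
        for i in selected:
            nearest, farthest = distance_range(ref, clusters[i])
            if nearest <= high and farthest >= low:
                kept.append(i)
        selected = kept               # the "m set", then the "mn set", ...
    return selected

clusters = [([1.0, 1.0], [2.0, 2.0]),
            ([5.0, 5.0], [6.0, 6.0]),
            ([0.0, 6.0], [1.0, 7.0])]
v = (5.5, 4.0)
print(narrow(v, clusters, [(0.0, 0.0)], cv=2.0))
print(narrow(v, clusters, [(0.0, 0.0), (10.0, 0.0)], cv=2.0))
```

In this made-up example the first reference point leaves two candidate clusters and the second reduces them to one, which is the only cluster whose distance to V then needs to be computed.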

These indexing and retrieval schemes (including
variations with more than one reference point and different search
data structures) have proven to be effective with IMS system 
monitoring, but other schemes can be adapted as warranted 
by the deployment and size of the cluster database. 

Additionally, to limit database size and increase query
efficiency, a separate IMS cluster database could be
generated for different modes of system operation, and the
appropriate database queried when the system is operating in the
mode appropriate for the particular knowledge base. For
example, rather than have a single knowledge base for
monitoring the performance of a helicopter, one could
construct separate databases for hovering, idling, climbing,
and other modes of operation, and query the appropriate
database for the particular mode of helicopter operation.
This embodiment of IMS involves pre-selecting a
knowledge database to query from among a plurality of knowledge
databases on the basis of one or more system parameters.
Query efficiency can be additionally increased by making
use of sublists as described above.
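
Pre-selecting a knowledge database by operating mode could be as simple as a keyed lookup, as in this sketch. The mode names follow the helicopter example above, but the cluster values, the containment-only check, and the names `DATABASES`, `contains`, and `is_nominal` are assumptions for illustration.

```python
def contains(vector, cluster):
    """True if every parameter lies within the cluster's range."""
    lows, highs = cluster
    return all(lo <= v <= hi for v, lo, hi in zip(vector, lows, highs))

# Hypothetical per-mode cluster databases (made-up parameter ranges).
DATABASES = {
    "hovering": [([0.0, 10.0], [5.0, 20.0])],
    "climbing": [([20.0, 30.0], [40.0, 60.0])],
}

def is_nominal(vector, mode, databases=DATABASES):
    """Query only the database built for the current operating mode."""
    return any(contains(vector, cluster) for cluster in databases[mode])

print(is_nominal([3.0, 15.0], "hovering"))
print(is_nominal([3.0, 15.0], "climbing"))
```

The same readings can be nominal in one mode and off-nominal in another, which a single combined database would not distinguish.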

As noted, in another embodiment, selecting a monitoring
distance tolerance of zero can increase monitoring speed.
This embodiment eliminates the calculation of distances
between the input vector and database clusters. Zero
distance tolerance also allows the use of more efficient search
structures. Any search technique that is able to determine if
an input vector is contained in a cluster is appropriate. A
decision tree structure similar to those produced by
Quinlan’s ID3 system can provide efficient monitoring
performance in this case. The branch points of the decision tree
would be determined by the parameter ranges represented in
the IMS clusters. The use of sublists can further increase
query efficiency.

Other techniques can be used with IMS to build decision 
trees, artificial neural networks, support vector machines, or 
other structures that use training sets including both nominal 
and off-nominal data in order to generate a system moni- 
toring knowledge base. IMS can be used to generate data for 
these types of systems, among others, using only nominal 
training data. It may also be desirable to convert an IMS 
knowledge base into one of these other forms to facilitate 
integration with an existing monitoring system, or to provide 
a more compact knowledge representation. 

FIG. 12 depicts a flowchart of an embodiment of a 
technique for using the system monitoring module to gen- 
erate training data for another learning application, typically 
non-IMS. To produce training data suitable for these other 
techniques, one can submit a wide variety of data vectors to 
an already-generated IMS knowledge base (generated with
nominal data) and record the results. In step 200, data 
vectors are received having both nominal and off-nominal 
data values. In one embodiment, the nominal and off- 
nominal data values may be evenly distributed in the vector 
space, or, alternately, strategically selected for effective 
training of the secondary (non-IMS) learning application. In 
step 202, the data vectors are compared against the cluster 
database to provide a comparison-result for each data vector. 
In step 142, the data vector-comparison-result pairs are 
supplied as training input data to the secondary learning 
application to produce a desired monitoring knowledge 
base. In one embodiment, for example, the comparison- 
result indicates whether the data vector is in a cluster, that is, 
has nominal parameter values, or not in a cluster, that is, has 
off-nominal parameter values. In another embodiment, 
depending on the secondary learning application, the 
numeric IMS vector distance values are used as the “com- 
parison-results” for each input vector. In yet another 
embodiment, grouping thresholds are defined to group 
ranges of distance values into classes and a class identifier
is also supplied with the “comparison-results” for training 
the secondary learning application. 
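
The three kinds of comparison-result described above (in or out of a cluster, numeric distance, and distance class) can be produced together, as in this sketch. It assumes clusters stored as per-parameter lower and upper bounds and a Euclidean distance; `label_vectors`, the dictionary keys, and the grouping thresholds are hypothetical.

```python
import math
from bisect import bisect_left

def min_distance(vector, cluster):
    lows, highs = cluster
    return math.sqrt(sum(max(lo - v, 0.0, v - hi) ** 2
                         for v, lo, hi in zip(vector, lows, highs)))

def label_vectors(vectors, clusters, thresholds=()):
    """Pair each data vector with its comparison-result."""
    rows = []
    for vector in vectors:
        distance = min(min_distance(vector, c) for c in clusters)
        rows.append({
            "vector": vector,
            "nominal": distance == 0.0,                     # in a cluster or not
            "distance": distance,                           # numeric IMS distance
            "class_id": bisect_left(thresholds, distance),  # grouped distance class
        })
    return rows

clusters = [([0.0, 0.0], [1.0, 1.0])]
vectors = [[0.5, 0.5], [1.0, 3.0], [1.0, 9.0]]
for row in label_vectors(vectors, clusters, thresholds=(1.0, 5.0)):
    print(row)
```

The resulting rows can then be handed to whichever secondary learning application is being trained.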

When monitoring with IMS, it is possible for a monitoring 
parameter contained in the IMS cluster database to become 
irrelevant or unusable for the monitoring task. For example, 
a system sensor may fail and give erroneous readings that 
cause false alarms. One way to deal with this situation is to 
redefine the parameter vector and regenerate the cluster 
database without the unreliable parameter(s). In another 
embodiment, the monitoring search technique adjusts the 
parameter values so that parameters that rely on the faulty 
input data will always match any range specified for those 
parameters in any cluster. This effectively ignores the erro- 
neous data at the expense of somewhat less accurate moni- 
toring. The distance metric or monitoring tolerances may be 
adjusted as appropriate to account for changes caused by 
parameter exclusions. 

FIG. 13 depicts a flowchart of an embodiment of a
technique to adjust erroneous parameters of the system
monitoring module. In step 210, a monitored-system vector
is received. Step 212 determines if any parameter of the
monitored-system vector is erroneous. For example, the
parameters are compared to respective predefined
operational-input ranges, and if a parameter is outside its
operational-input range, that parameter is in error. If no parameter
is erroneous, in step 214, the monitored-system vector is
compared against the cluster database. Otherwise, in step 216,
the erroneous parameter(s) is adjusted such that the
parameter(s) will match any range specified for that parameter
in any cluster to provide an adjusted monitored-system
vector. In step 218, the adjusted monitored-system vector is
compared against the cluster database.
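
One possible way to make an erroneous parameter match any cluster range is to replace it with a wildcard that the comparison skips, as sketched here. The `None` sentinel, the operational-input ranges, and the names `adjust` and `min_distance` are assumptions; the text does not specify how the adjustment is represented.

```python
import math

def adjust(vector, operational_ranges):
    """Steps 212/216: replace out-of-range parameters with None (a wildcard)."""
    return [v if lo <= v <= hi else None
            for v, (lo, hi) in zip(vector, operational_ranges)]

def min_distance(vector, cluster):
    """Distance to a cluster box, ignoring wildcard (None) parameters."""
    lows, highs = cluster
    return math.sqrt(sum(max(lo - v, 0.0, v - hi) ** 2
                         for v, lo, hi in zip(vector, lows, highs)
                         if v is not None))

operational_ranges = [(0.0, 100.0), (0.0, 100.0)]  # hypothetical sensor limits
cluster = ([40.0, 40.0], [60.0, 60.0])
raw = [50.0, -999.0]                               # second sensor has failed

adjusted = adjust(raw, operational_ranges)
print(adjusted, min_distance(adjusted, cluster))
```

The failed sensor no longer triggers a false alarm, at the cost of monitoring one fewer parameter.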

In addition to scaling input parameters, it is often advan- 
tageous to scale the IMS output to represent a percent 
deviation from the nominal value, or other value readily 
interpreted by a human operator. For example, if the range 
of distances available in the parameter space is from 0 to
94000 and IMS returns a distance to the nearest cluster of 
940, “1% deviation” is likely to provide more meaningful 
information to the operator than the raw data of “940 
deviation.” 
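
The arithmetic of this example is a single ratio, shown here as a small sketch; the function name `percent_deviation` is hypothetical.

```python
def percent_deviation(distance, max_distance):
    """Express a raw IMS distance as a percentage of the available range."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    return 100.0 * distance / max_distance

print(f"{percent_deviation(940, 94000):g}% deviation")
```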

Several courses of action could be taken if a query to the
IMS finds that a data vector is suspect during system
monitoring. For instance, the anomaly could be logged, an
operator could be notified, or an automated diagnostic
routine could be invoked. It is also possible to use the IMS
techniques to produce diagnostic databases if there are data
sets available that represent known system anomalies. A
system simulator that allows simulated failures, or data from
specific system failure tests, is useful for producing IMS
diagnostic databases. One could train IMS using the
anomalous data for a given failure in the same way that it is trained
using nominal data. When off-nominal data is detected by
IMS, the suspect vectors can be compared to the diagnostic
cluster databases. If there is a close match with a particular
database, then the failure represented by that database is a
candidate for the diagnosis.

FIG. 14 depicts a high-level flowchart of an embodiment
of a technique to generate a diagnostic cluster database in the
inductive learning module and to monitor data, in the
system monitoring module, using the diagnostic cluster
database. In step 220, the inductive learning module generates a
cluster database using anomalous data for a given failure. In
step 222, the inductive learning module annotates that
cluster database with the failure that was manifested to
provide a diagnostic cluster database. In step 224, when
off-nominal data is detected, the system monitoring module
compares the monitored-system vector to the diagnostic
cluster database to attempt to identify the failure. If a
monitored-system vector is within, or sufficiently close to, a
particular cluster in the diagnostic cluster database, a user
could be informed of the annotation describing the failure
associated with the particular cluster, among other possible
actions.
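
The lookup of step 224 can be sketched as a comparison against annotated cluster databases. The failure annotations, cluster values, tolerance, and the names `diagnose` and `min_distance` are hypothetical, and clusters are again assumed to be per-parameter lower and upper bounds with a Euclidean distance.

```python
import math

def min_distance(vector, cluster):
    lows, highs = cluster
    return math.sqrt(sum(max(lo - v, 0.0, v - hi) ** 2
                         for v, lo, hi in zip(vector, lows, highs)))

def diagnose(vector, diagnostic_dbs, tolerance):
    """Return failure annotations whose clusters lie within tolerance, closest first."""
    matches = []
    for failure, clusters in diagnostic_dbs.items():
        distance = min(min_distance(vector, c) for c in clusters)
        if distance <= tolerance:
            matches.append((distance, failure))
    return [failure for _, failure in sorted(matches)]

# Hypothetical diagnostic databases, each trained on data from one failure.
diagnostic_dbs = {
    "valve stuck": [([8.0, 0.0], [9.0, 1.0])],
    "sensor drift": [([0.0, 8.0], [1.0, 9.0])],
}
print(diagnose([8.5, 1.2], diagnostic_dbs, tolerance=0.5))
print(diagnose([4.0, 4.0], diagnostic_dbs, tolerance=0.5))
```

An empty result means that no known failure matches, so the anomaly would need other handling, such as logging or operator notification.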

FIG. 15 depicts an illustrative computer system 250 that
utilizes the teachings of the present invention. The computer
system 250 comprises a processor 252, a display 254, input
interfaces 256, communications interface 258, memory 260,
and output interfaces 262, all conventionally coupled by one
or more busses 264. The input interfaces 256 comprise a
keyboard 266 and mouse 268. The output interface 262 is a
printer 272. The communications interface 258 is a network
interface card (NIC) that allows the computer 250 to
communicate via a network, such as the Internet. Sensors 274 are
also coupled to the processor 252. The sensors 274 supply
real-time input data.

The memory 260 generally comprises different
modalities, illustratively semiconductor memory, such as random
access memory (RAM), and disk drives. Depending on the
embodiment, the memory 260 stores one or a combination
of the following: an operating system 280, data acquisition
module 282, in some embodiments training system data 22,
vectors 286, the inductive learning module 24, the cluster
database 26, the system monitoring module 30, in some
embodiments monitored-system data 28 and an annotated
diagnostic cluster database 290. The operating system 280
may be implemented by any conventional operating system
such as UNIX®, WINDOWS®, and LINUX®, among
others.

The training system data 22 may typically comprise any
of archived data 292, nominal data 294 and off-nominal data
296. The inductive learning module 24 typically comprises
a cluster generation module 302 that, for example,
implements the flowchart of FIG. 7, or alternately FIG. 8. The
inductive learning module also typically comprises a
“generate indexed-cluster module” 304 that, for example,
implements the flowchart of FIG. 10. The system monitoring
module 30 typically comprises a cluster retrieve module 310
that, for example, implements the flowchart of FIGS. 11A
and 11B.

Various embodiments of the present inventive techniques
are typically incorporated in the inductive learning module
24, the cluster database 26 and the system monitoring
module 30. Generally, the inductive learning module 24 and
the system monitoring module 30 are tangibly embodied in
a computer-readable device, carrier or medium, for example,
memory 260, and comprise instructions which, when
executed by the processor 252 of the computer system 250,
cause the computer system 250 to utilize the present
invention.

Various embodiments of the present invention may be
implemented as a method, apparatus, or article of manufacture
using standard programming and/or engineering techniques
to produce software, firmware, hardware, or any
combination thereof. The term “article of manufacture” (or
alternatively, “computer program product”) as used herein is
intended to encompass a computer program accessible from
any computer-readable device, carrier or media. Those
skilled in the art will recognize that many modifications may
be made to this configuration without departing from the
scope of the present invention.

The exemplary computer system illustrated in FIG. 15 is 
not intended to limit the present invention. Other alternative 
hardware environments may be used without departing from
the scope of the present invention. 

EXAMPLES 

A Hybrid Combustion Facility (“HCF”) at the NASA
Ames Research Center is used to investigate the combustion
properties of hybrid fuel formulations intended for use in
rocket propulsion. A hybrid rocket is one in which the fuel
is in a solid form and the oxidizer is in liquid form, a
formulation that has several potential advantages over
conventional solid or liquid fueled rockets. For example, hybrid
fuels have increased safety in manufacturing, handling and
in use. The fuel is not volatile but, unlike conventional solid
fuel rockets, hybrid fuel rockets can be throttled to alter the
thrust of the rocket after ignition. The HCF is a scale-up of
various bench tests of hybrid fuel rockets and is highly
instrumented. Thus, HCF offers a good opportunity to test
the Inductive Monitoring System (“IMS”) described herein
with a realistic operational system and in comparison with
other techniques for monitoring the performance (or
“health”) of a complex system as part of an overall NASA
program to examine various Integrated Vehicle Health
Management (IVHM) techniques. Numerous references to
IVHM architectures and applications can be found on NASA
and other websites and in various conference proceedings,
including the paper entitled “Integrated System Health
Management for Reusable In-Space Transportation Systems”
published in the Proceedings of the 54th International
Astronautical Congress of the International Astronautical
Foundation, the International Academy of Astronautics and
the International Institute of Space Law, 29 September-3
October 2003, Bremen, Germany, Ref. No.
IAC-03-IAA.13.1.02 (incorporated herein by reference).

Prior to a firing of the Hybrid Combustion Facility, 
oxidizer stored in the liquid oxygen (“LOX”) tank is 
pumped through a vaporizer, where the oxidizer is gasified, 
and enters the gaseous oxygen (“GOX”) tank. Over a time 
interval, up to an hour, GOX flows into the GOX tank until 
the tank pressure reaches the required level for the desired 
mass flow rate and run duration. At this point, the LOX feed 
system is isolated from the GOX tank by closing a first 
shutoff valve between the vaporizer and the GOX tank. The 
operator enters the desired run setpoints into a control 
computer. These setpoints include parameters for control
valve scheduling, ignition timing, desired delivery pressure 
and configuration information. After a firing countdown is 
completed, an upstream shutoff valve is opened. The result- 
ing GOX outflow chokes at an orifice (sonic nozzle) and 
continues into the combustion chamber. A short time later, 
the ignition system oxidizer and fuel flow are turned on and 
ignited by a spark. High temperature combustion products 
from the ignition system are injected into the combustion 
chamber to vaporize paraffin fuel, which mixes with the free 
stream oxidizer and the ignition products to ignite the 
paraffin fuel in a self-sustaining combustion reaction. As the 
GOX tank pressure decreases during the course of a firing, 
the control valve opens to maintain constant delivery pres- 
sure and constant mass flow to the combustion chamber. A 
Venturi in the GOX feed line measures the oxygen flow rate 
but is accurate only for steady state operating conditions. 
The orifice measures the mass flow rate more accurately 
during transients and also serves to isolate any pressure 
fluctuations in the combustion chamber from the feed sys- 
tem. A check valve located upstream prevents reverse flow 
of combustion gases from entering the GOX feed line. Two 
burst disks located downstream from the orifice and one 
burst disk located upstream from the GOX tank protect 
against over-pressurization. Pressure sensors are located at 
the GOX tank, at the orifice and at the combustion chamber. 
A high frequency pressure sensor for the combustion cham- 
ber and a differential pressure sensor are located at the 
Venturi. The GOX temperature is measured upstream from 
the orifice. 

IMS was used to build a monitoring knowledge base for 
the gaseous oxygen (“GOX”) delivery system on the HCF, 
as schematically depicted in FIG. 16. The primary function 
of the GOX system is to vaporize liquid oxygen (“LOX”) 
and to provide gaseous oxygen to the combustion chamber 
at an appropriate rate to sustain the HCF combustion pro- 
cess. A malfunction in the GOX delivery system would 
typically result in an unexpected flow rate for a given 
pressure and configuration of valve settings. For conve- 
nience in capturing flow rate data in the definition of the IMS 
data vector, sensor readings from two consecutive data 
samples (or “frames”) were combined into a single vector. 
As conveniently used in this example, each data vector has 
seven data elements as shown in FIG. 17, in which: 

“POV-4, Position” is the current position (in degrees 
open) of the shutoff valve (see FIG. 16). 

“PCV-6, Position” is the current position of the control 
valve (situated between the GOX tank and the combustion 
chamber). 

“PIT3, Pressure” is the current pressure of the GOX tank
(typically in the range from 0 to approximately 3,000 psi).

“PT6, Pressure” is the feed pressure just upstream of the
sonic orifice delivering GOX to the combustion chamber
(typically in the range from 0 to approximately 3,000 psi).

“PIT3-PT6, Difference” is the difference between PIT3
and PT6.

“PIT3, Change” is the change in GOX tank pressure PIT3
since the last data sample.

“PT6, Change” is the change in the orifice feed pressure,
PT6, since the last data sample.

Thus, the seven element data vector of FIG. 17 captures 
four current operating parameters, two parameters incorpo- 
rating changes from the previous data frame and one param- 
eter derived as the difference of two other parameters within 
the current data frame. Incorporating the pressure change
(difference) between PIT3 and PT6 provides a normalization 
parameter that allows IMS to form more general behavior 
classes not depending on specific pressure values for PIT3 
and PT6. 
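
The construction of the seven-element vector from two consecutive frames can be illustrated with a minimal Python sketch. The dictionary-based frame layout, the function name build_vector and the sign convention for the two change elements (current sample minus previous sample) are assumptions made for illustration; the text specifies only the seven elements of FIG. 17. The sample readings are likewise hypothetical.

```python
from typing import Dict, List


def build_vector(prev: Dict[str, float], curr: Dict[str, float]) -> List[float]:
    """Combine two consecutive frames into one seven-element vector."""
    return [
        curr["POV-4"],                # shutoff valve position (degrees open)
        curr["PCV-6"],                # control valve position
        curr["PIT3"],                 # GOX tank pressure
        curr["PT6"],                  # feed pressure upstream of the orifice
        curr["PIT3"] - curr["PT6"],   # PIT3-PT6 difference
        curr["PIT3"] - prev["PIT3"],  # change in PIT3 (assumed: current - previous)
        curr["PT6"] - prev["PT6"],    # change in PT6 (assumed: current - previous)
    ]


# Hypothetical readings for two consecutive data frames.
prev_frame = {"POV-4": 85.0, "PCV-6": 22.0, "PIT3": 1986.0, "PT6": 1196.0}
curr_frame = {"POV-4": 85.0, "PCV-6": 23.0, "PIT3": 2027.0, "PT6": 1202.0}

vector = build_vector(prev_frame, curr_frame)
print(vector)
```

Only the current frame contributes absolute readings; the previous frame enters solely through the two change elements, which is how flow-rate information is captured in a single vector.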

While it is expected that other data vector definitions, 
other parameter choices, and the like, would also work with 
IMS, the data vector of FIG. 17 provided acceptable results 
for the particular example considered here. 

Fifteen data sets were collected from test firings of the
HCF in which sensor data was collected and was sufficiently
complete for use in training and testing IMS. No significant
anomalies were detected in the performance of the GOX
delivery system for any of the 15 test firings. Nine of the data
sets used a larger orifice diameter on the sonic orifice than
the other six. The nine “large orifice” data sets were used to
train IMS and were defined to be “nominal” system
performance. The six data sets obtained with the use of the small
sonic orifice were then tested against the trained IMS. When
properly classifying system health, these six “small orifice”
data sets should be reported as “suspect.”

Example 1 

The nine data sets collected with large orifice firings of the
HCF (“nominal” data sets) were used to train the IMS.
Vectors from three of the nine data sets were used as “basis
classes” with each vector forming a nominal class containing
one member. Three additional data sets from the nine
were used to expand the initial basis class definitions
through interpolation. For most vectors in these sets, the
previously-defined class that was closest to a vector was
expanded to include the vector as well as any data values
lying between the vector and the class. If a training vector
was too far away from any of the previously defined classes,
a new nominal class was formed containing that vector as its
initial member. The class closest to a vector is defined as that
class that would require the least amount of expansion in
order to incorporate that vector. Vector distance (or required
expansion) is measured as the sum of percent changes in
each vector parameter that is required to include the new
vector. For example, if the new training vector is {85, 23,
2027, 1202, 825, 41, 6} and a class was found containing the
vector {85, 22, 2030, 1200, 830, 44, 8}, the difference
between these two vectors would be {0, 1, 3, 2, 5, 3, 2}. The
difference for each parameter is then divided by the range of
possible values for that parameter to obtain a percentage
change. The percentage changes thus obtained are summed,
and the result is the “distance” between the new vector and
the closest vector in the class. Once the class with the “least
required expansion” (lowest percentage sum) is identified,
the new vector and vectors with parameters falling between
the new vector and the closest vector in the class are added
to the class (provided the new vector is within a specified
maximum distance from the class).
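
The distance computation and class expansion described above can be sketched in Python using the two vectors from the example. The per-parameter ranges and the maximum distance are hypothetical values chosen only for illustration (the text does not state them), and representing a class by lower and upper bounds per parameter is an assumption consistent with, but not dictated by, the description.

```python
def abs_differences(a, b):
    """Per-parameter absolute difference between two vectors."""
    return [abs(x - y) for x, y in zip(a, b)]


def percent_distance(a, b, ranges):
    """Sum over parameters of |a_i - b_i| / range_i, expressed in percent."""
    return sum(100.0 * abs(x - y) / r for x, y, r in zip(a, b, ranges))


def expand_class(low, high, v):
    """Widen the [low, high] bounds of a class so that vector v lies inside."""
    return ([min(l, x) for l, x in zip(low, v)],
            [max(h, x) for h, x in zip(high, v)])


new_vec = [85, 23, 2027, 1202, 825, 41, 6]
member = [85, 22, 2030, 1200, 830, 44, 8]

# Hypothetical full-scale ranges per parameter (not given in the text).
ranges = [90, 90, 3000, 3000, 3000, 100, 100]
MAX_DISTANCE = 10.0  # hypothetical maximum distance, in percent

diffs = abs_differences(new_vec, member)  # [0, 1, 3, 2, 5, 3, 2]
distance = percent_distance(new_vec, member, ranges)

# One class whose only member so far is `member`.
classes = [(list(member), list(member))]
if distance <= MAX_DISTANCE:
    # Close enough: expand the class to include the new vector.
    classes[0] = expand_class(classes[0][0], classes[0][1], new_vec)
else:
    # Too far away: start a new class containing only the new vector.
    classes.append((list(new_vec), list(new_vec)))

print(diffs, round(distance, 2), classes)
```

With bounds stored per parameter, every vector lying between the new vector and the previous class member is included automatically once the bounds are widened.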

The last training step in this example uses the three 
remaining data sets (from the nine) to estimate an upper and 
lower error bound for each data parameter. IMS identified 
the class that was “closest” to each of these three new 
training vectors. However, instead of including the new 
vector and its neighbors into the closest class, IMS adjusted 
a global weight to be added to or subtracted from each vector 
parameter when testing for class membership. The weight 
values are intended to compensate for inaccuracies in the 
sensor data and for the limitations of the training data sets. 
When a set of weights was found that classified all of the 
data sets in the third group as “nominal” data, the IMS 
training was complete. 
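
A rough Python sketch of this last training step follows. The text does not say how the weights were searched for, so the strategy shown here (growing every weight by a fixed step until all held-out vectors test as nominal) is purely an assumption, as are the function names, the step sizes and the sample data.

```python
def in_class(v, low, high, weights):
    """True if v lies within the class bounds widened by the global weights."""
    return all(l - w <= x <= h + w
               for x, l, h, w in zip(v, low, high, weights))


def is_nominal(v, classes, weights):
    """Linear scan: v is nominal if any widened class contains it."""
    return any(in_class(v, low, high, weights) for low, high in classes)


def tune_weights(vectors, classes, step, max_rounds=1000):
    """Grow each weight by its step until all vectors test nominal.

    This search strategy is an assumption, not the documented procedure.
    """
    weights = [0.0] * len(step)
    for _ in range(max_rounds):
        if all(is_nominal(v, classes, weights) for v in vectors):
            return weights
        weights = [w + s for w, s in zip(weights, step)]
    raise RuntimeError("no weight set found within max_rounds")


# One class (lower bounds, upper bounds) and one held-out nominal vector.
classes = [([85, 22, 2027, 1200, 825, 41, 6],
            [85, 23, 2030, 1202, 830, 44, 8])]
held_out = [[85, 23, 2031, 1203, 828, 42, 7]]

weights = tune_weights(held_out, classes, step=[1.0] * 7)
print(weights)
```

The class definitions are left untouched; only the tolerance applied at membership-test time changes, which matches the description of a global weight added to or subtracted from each parameter.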

After training as described above, IMS was tested using 
two “nominal” (large orifice) data sets and four “off-nomi- 
nal” (small orifice) data sets that had not been included in the 
IMS training data. Thus, a total of 17 data sets were used, 9 
nominal data sets for training, 2 additional nominal data sets 
for testing and 6 off-nominal data sets for testing. The data 
vectors from each set were processed with the IMS moni- 
toring knowledge base, presented in the same order as they 
were collected from the HCF. IMS correctly identified the 
off-nominal data sets as suspect soon after the shutoff valve 
was opened. The IMS also correctly processed the two 
additional nominal data sets, finding that the data sequences 
were properly included in the nominal classes in the knowl- 
edge base. Therefore, we conclude that if IMS were installed 
in the HCF facility, alerts would have properly been sent for 
off-nominal data sets (but not for nominal), promptly alert- 
ing the operator or possibly initiating a system shut-down. 

Example 2 

In addition to actual sensor data collected from HCF 
firings, the IMS was tested on simulated data. Interval 
Constraint Simulator software (“ICS”) was developed at 
NASA Ames Research Center to provide a fast, flexible 
system modeling and simulation tool. Further description of 
ICS can be found in Attachment A, which is incorporated 
herein by reference. The ICS was used to simulate the HCF 
and to produce 1,200 simulated HCF runs. These simulated 
data sets were divided into three groups and used to train the 
IMS as in Example 1. However, these simulated data sets 
used to train the IMS and create the monitoring knowledge 
base did not reproduce the noise characteristics of actual 
HCF data collected by sensors. As a result the IMS trained 
on simulated data was not effective in monitoring actual 
(measured) data sets. 

IMS monitoring improved on actual data when three 
actual data sets (including noise) were added to the training 
set. The incorporation of actual data sets allowed IMS to 
incorporate data noise characteristics into the knowledge 
base and more faithfully to monitor the actual HCF perfor- 
mance. When this updated knowledge base (including data 
sets with actual noise) was tested on simulated data sets not 
used for IMS training, this updated knowledge base pro- 
vided monitoring results similar to that produced by the 
knowledge base trained on strictly simulated data. Thus, the 
addition of even a small number of actual system data to the 
IMS training sets improved IMS performance on actual 
system test data, but did not seriously degrade the perfor- 
mance of IMS when tested on simulated test data. 

Other approaches can be used to train IMS using simulated
data and yet train IMS for monitoring actual operating
systems. For example, the acceptable deviations from nominal
values during system monitoring can be increased to
account approximately for sensor noise. Additionally, wider
tolerances can be used when expanding the clusters during
IMS training to account for expected sensor noise. More
realistic system simulations (including sensor noise) can be
employed for IMS training, thereby providing a better match
to actual system data profiles.

This ability to train with simulated data enables IMS to
produce useful monitoring knowledge databases for systems
lacking an extensive archive of actual performance data. In
addition, the use of simulated data allows IMS to include
information about previously unexplored system operating
regimes in the monitoring knowledge base by simulating
such regimes before an actual system run.

Example 3 

IMS was trained with simulated data as in Example 2. All
IMS training data that was used to construct the system
monitoring knowledge base was derived from nominal
simulations. Simulated test data was then generated including
a system failure. In particular, a failure was injected near
the end of the simulation that caused the shut-off valve
POV-4 to stick open at 20 degrees instead of fully closing.
When the data from the failure simulation was processed by
IMS, the off-nominal system performance was detected
within two data frames (0.2 simulated seconds) of the
occurrence of the failure. Although IMS in the embodiment
employed here did not identify the shut-off valve as the
cause of the off-nominal behavior, rapid detection of an
anomaly was achieved.

Example 4 

The IMS monitoring technique of these examples was not
optimized for speed, although initial timing tests appear to
be promising. A linear search was used to match input data
with the classes of the monitoring knowledge base formed
from the 1,200 simulated training data sets of Example 2.
Data records were read sequentially from a disk file. Running
on a Sun Microsystems Blade 1000 workstation with a
750 MHz processor, IMS achieved a processing speed of
approximately 2,000 data records per second. A Sun Ultra 10
with a 300 MHz processor was able to process approximately
700 records per second. It is expected that IMS
would be able to process at kilohertz data rates if the data
acquisition interface were able to transfer data sufficiently
rapidly.

Although the above application of IMS to the HCF used
three training data subsets and three training phases, other
embodiments of IMS can also be employed. For example,
some embodiments of IMS combine these three training
steps into one pass through all training data, and do not use
a division of training data into subsets. While it is expected
that such embodiments of IMS will give similar system
monitoring accuracy, some embodiments may be advantageous
in terms of computational simplicity and/or training
and monitoring speed.


IMS is currently performing real-time system health
monitoring for a UH-60 Blackhawk research helicopter at
the NASA Ames Research Center. The two turbine engines
on the helicopter are monitored by means of a separate IMS
cluster database for each engine. Data is collected at a rate
of 4 Hz and transmitted over a MIL-STD-1553 Data Bus to
the IMS computer on board the helicopter. The IMS data
vectors include three time samples where each sample
includes sensor values for: 1) engine torque; 2) stage-1
compressor speed; 3) stage-2 compressor speed; 4) fuel
flow; and 5) rotor speed. Thus, five sensor values at three
consecutive time samples result in 15 data values in each
IMS vector.
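
The assembly of a 15-value vector from three consecutive samples can be sketched in Python as follows. The sensor key names are hypothetical, and the use of overlapping (sliding) windows is an assumption: the text states only that each vector contains three consecutive time samples.

```python
from collections import deque

# Hypothetical key names for the five sensor values listed above.
SENSORS = ("torque", "stage1_speed", "stage2_speed", "fuel_flow", "rotor_speed")
WINDOW = 3  # consecutive time samples per vector


def windowed_vectors(samples):
    """Yield one 15-value vector per sample once three samples are buffered."""
    window = deque(maxlen=WINDOW)
    for sample in samples:
        window.append([sample[name] for name in SENSORS])
        if len(window) == WINDOW:
            # Flatten oldest-to-newest: 3 samples x 5 sensors = 15 values.
            yield [value for frame in window for value in frame]


# Four made-up samples (at 4 Hz these would span one second of flight data).
samples = [{name: float(10 * t + i) for i, name in enumerate(SENSORS)}
           for t in range(4)]

vectors = list(windowed_vectors(samples))
print(len(vectors), len(vectors[0]))
```

With a sliding window, a new vector becomes available at every sample after the first two, so the monitoring rate matches the acquisition rate.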

IMS was trained with data collected on ten helicopter
flights and two cluster databases were formed for system
monitoring, one for each engine. During system monitoring,
the data from each engine is evaluated by IMS by means of
the corresponding engine-specific cluster database in real
time as collected. The data is displayed on a video display
on board the helicopter as well as stored for later analysis.
To date, no actual in-service system anomalies have
occurred. However, IMS was tested by means of a simulated
system anomaly. Specifically, the fuel flow values were
increased by 10% to 15% in one engine for a period of time.
IMS detected this anomaly and displayed a significant
“deviation from normal” value on the display.

Although various embodiments which incorporate the 
teachings of the present invention have been shown and 
described in detail herein, those skilled in the art can readily 
devise many other varied embodiments that still incorporate 
these teachings. 

APPENDIX 

It is desired to show that any cluster in the database that
is closer to the vector V than cluster Cc must include points
that are a distance between (Dv+Cv) and MAX{0, (Dv-Cv)}
away from the indexing reference point. Herein we
assume (Dv-Cv)≥0. The case in which (Dv-Cv)<0, and is
therefore replaced by 0, proceeds by the same arguments.

Recalling that Cv is the distance between the input vector
V and cluster Cc, consider the region of (N dimensional)
space containing the point defined by vector V and all points
that are a distance from 0 to Cv away from V. We call this
a “ball” around point V, Bv. Recalling that Dv is the distance
of point V from the indexing reference point, Bv contains a
point closest to the indexing reference point and a distance
(Dv-Cv) or less from the indexing reference point. Bv also
contains a point farthest from the indexing reference point at
a distance (Dv+Cv) from the indexing reference point. Bv
may or may not contain the indexing reference point.

If Bv does not contain the indexing reference point,
consider all points in space having a distance between
(Dv-Cv) and (Dv+Cv) from the indexing reference point.
We call this shell around the indexing reference point Sz. Sz
contains all of the points of Bv. Any point outside of Sz lies
at a distance less than Dv-Cv, or greater than Dv+Cv, from
the indexing reference point. Also, since Sz contains all of
Bv, no point outside of Sz will be in Bv and, therefore, no
point outside of Sz will be closer to V than Cc. This means
that any point less than a distance Dv-Cv or greater than
Dv+Cv from the indexing reference point will be further
from V than Cc.

If Bv contains the indexing reference point (that is, V is
closer to the indexing reference point than it is to Cc), Sz
becomes a sphere around the indexing reference point with
radius Dv+Cv as the inner radius of the Sz shell collapses to
zero. All points lying outside this sphere Sz also lie outside
the sphere Bv so only points less than a distance Dv+Cv from
the indexing reference point can be closer to V than Cc.
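
The pruning bound argued above can be illustrated with a short Python sketch. Several simplifications are assumed here and are not taken from the text: each cluster is reduced to a single representative point, Euclidean distance is used as the metric, the indexing reference point is the origin, and the initial candidate Cc is simply the entry whose indexing distance is closest to Dv. The names build_index and nearest_indexed are hypothetical.

```python
import bisect
import math
import random


def dist(a, b):
    """Euclidean distance (an assumed metric that obeys the triangle inequality)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(points, ref):
    """Sort points by their distance from the indexing reference point."""
    entries = sorted((dist(p, ref), p) for p in points)
    return [d for d, _ in entries], [p for _, p in entries]


def nearest_indexed(v, keys, pts, ref):
    """Nearest point to v, examining only the shell [max(0, Dv-Cv), Dv+Cv]."""
    dv = dist(v, ref)
    # Initial candidate Cc: the entry whose indexing distance is nearest Dv.
    i = min(bisect.bisect_left(keys, dv), len(keys) - 1)
    best, cv = pts[i], dist(v, pts[i])
    lo = bisect.bisect_left(keys, max(0.0, dv - cv))
    hi = bisect.bisect_right(keys, dv + cv)
    examined = 0
    for p in pts[lo:hi]:
        examined += 1
        d = dist(v, p)
        if d < cv:
            best, cv = p, d
    return best, examined


random.seed(0)
ref = (0.0, 0.0, 0.0)
points = [tuple(random.uniform(0.0, 100.0) for _ in range(3)) for _ in range(500)]
keys, pts = build_index(points, ref)

v = (40.0, 55.0, 30.0)
best, examined = nearest_indexed(v, keys, pts, ref)
brute_force = min(dist(v, p) for p in points)
print(examined, len(points), dist(v, best) == brute_force)
```

Entries whose indexing distance falls outside the shell are never compared against V, which is the source of the speed-up over a linear search; the bound itself guarantees that no closer entry is skipped.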
What is claimed is:

1. A method of inductive learning comprising providing a
computer that is programmed:

to provide or receive training data, including at least one
of archived data, simulated nominal data and
off-nominal data;

to provide vectors having a set of parameters determined
from the training data;

to generate a cluster database comprising clusters that are
associated with respective ranges of values for at least
a subset of the set of parameters;
to index the clusters of the cluster database based on an
indexing distance of each of the clusters from a
predetermined indexing reference point;
to organize the clusters into a data structure of clusters
based on the cluster indexing; and
to display a relationship between at least one of the
vectors and the data structure in a visually perceptible
format.

2. The method of claim 1 wherein said process of
generating comprises:

determining a separation distance between a selected test
vector and one of said clusters, and
producing a new cluster including the test vector, when
the separation distance exceeds a threshold value.

3. The method of claim 2, wherein said computer is
further programmed to determine a deviation distance by
dividing said separation distance between said test vector
and said one of said clusters by a value representing a range
of values of at least one variable in said one of said clusters,
and to associate the deviation distance with a severity of a
deviation of the at least one monitored-system vector from
a nearest cluster.

4. The method of claim 1 wherein said process of
generating comprises:

determining a separation distance between a selected test
vector and at least one of said clusters, and
expanding the at least one cluster to include the test vector
when the separation distance is less than or equal to a
threshold value.

5. The method of claim 4, wherein said computer is
further programmed to determine a deviation distance by
dividing said separation distance between said test vector
and said one or said clusters by a value representing a range
of values of at least one variable in said at least one of said
clusters, and to associate the deviation distance with a
severity of a deviation of the at least one monitored-system
vector from a nearest cluster.

6. A method of monitoring a system comprising providing
a computer that is programmed:

to provide or receive a cluster database comprising clusters
that are associated with respective ranges of values
for at least a subset of a set of cluster parameters;
to receive at least one monitored-system vector having
monitored-system parameters, with parameter values
generated by sensors that provide data measured on a
monitored system;

to determine whether the at least one monitored-system
vector is contained in any of the clusters based on at
least a subset of the monitored-system parameters and
the subset of the cluster parameters; and
when at least one of the monitored-system vectors is not
contained in any cluster, to determine a deviation
distance of the at least one monitored-system vector
from a nearest cluster, to associate the determined
deviation distance with a severity of a deviation of the
at least one monitored-system vector from the nearest
cluster, and to display in a visually perceptible format
at least one deviation distance for the parameter values
for the at least one monitored-system vector from the
corresponding parameter values for the nearest cluster.

7. The method of claim 6, wherein said computer is 
further programmed: 

to provide an additional database of clusters, associated 
with respective ranges of values for at least a subset of 
said set of parameters, the additional cluster database 
being annotated with diagnostic information; and 
when at least one of said monitored-system vectors is not 
included in any cluster, to compare at least one of said 
monitored-system vectors with at least one of the 
clusters of the additional cluster database. 

8. An apparatus for inductive learning comprising a 
computer that is programmed: 

to provide or receive training data, including at least one 
of archived data, simulated nominal data and off- 
nominal data; 

to provide at least one vector having a set of parameters 
based on said training data; and 
to generate a cluster database comprising clusters asso- 
ciated with selected ranges of values for at least a 
subset of the set of parameters; 
to index the clusters of the cluster database based on an 
indexing distance of each of the clusters from a pre- 
determined indexing reference point; 
to organize the clusters into a data structure of clusters 
based on the cluster indexing; and 
to display a relationship between at least one of the 
vectors and the data structure in a visually perceptible 
format. 

9. The apparatus of claim 8, wherein said process of 
generating comprises: 

determining a separation distance between a test vector 
and one of said clusters, and 
producing a new cluster if the separation distance exceeds 
a threshold value. 

10. The apparatus of claim 9, wherein said computer is 
further programmed to determine a deviation distance by 
dividing said separation distance between said test vector 
and said one or said clusters by a value representing a range 
of values of at least one variable in said one of said clusters, 
and to associate the deviation distance with a severity of a 
deviation of the at least one monitored-system vector from 
a nearest cluster. 

11. The apparatus of claim 8 wherein said process of 
generating comprises: 

determining a separation distance between a test vector 
and at least one of said clusters, and 
expanding the at least one of said clusters to include the 
test vector when the separation distance is less than or 
equal to a threshold value. 

12. The apparatus of claim 11, wherein said computer is 
further programmed to determine a deviation distance by 
dividing said separation distance between said test vector 
and said one or said clusters by a value representing a range 
of values of at least one variable in the at least one of said 
clusters, and to associate the deviation distance with a 
severity of a deviation of the at least one monitored-system 
vector from a nearest cluster. 

13. An apparatus for monitoring a system, comprising a 
computer, having a memory storing a cluster database 
comprising clusters, associated with respective ranges of 
values for at least a subset of a set of cluster parameters, 
where the computer is programmed:
to provide or receive one or more monitored-system
vectors having monitored-system parameter, with
parameter values generated by sensors that provide data
measured on a monitored system;
to determine whether the monitored-system vector is
contained in any of the clusters based on at least a
subset of the monitored-system parameters and the at
least a subset of cluster parameters; and
when at least one of the monitored-system vectors is not
contained in any cluster, to determine a deviation
distance of the at least one monitored-system vector
from a nearest cluster, to associate the determined
deviation distance with a severity of a deviation of the
at least one monitored-system vector from the nearest
cluster, and to display in a visually perceptible format
at least one deviation distance for a parameter value for
the at least one monitored-system vector from a
corresponding parameter value for the nearest cluster.

14. The apparatus of claim 13, wherein said computer is
further programmed:

to provide an additional database of clusters that are 
associated with respective ranges of values for at least 
a subset of said parameters, the additional cluster 
database being annotated with diagnostic information; 
and 

when at least one of said monitored-system vectors is not 
included in any of said clusters, to compare said at least 
one of said monitored-system vectors with the clusters 
of the additional cluster database. 



